Building Intelligent Systems for Mining Information Extraction Rules from Web Pages by Using Domain Knowledge

Authors

  • Heekyoung Seo
  • Jaeyoung Yang
  • Joongmin Choi
Abstract

Previous research on automatic information extraction has had difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be analyzed correctly. To resolve these problems, this paper presents a method of building intelligent systems that mine information extraction rules from semi-structured Web pages by using domain knowledge. The system automatically generates a wrapper for each information source and performs information extraction and information integration by applying this wrapper to the corresponding source. Both the domain knowledge and the wrapper are represented as XML documents to increase flexibility and interoperability. Tests of our prototype system on several real-estate information sites indicate that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction from heterogeneous information sources.
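The abstract does not show the XML formats or the rule-mining procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical domain-knowledge document whose `field` elements carry a recognizing `pattern`, and a hypothetical page layout in which each record is a `|`-delimited line. The element names, attributes, patterns, and the helpers `generate_wrapper` and `extract` are all invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical domain knowledge for real-estate listings (not from the paper):
# each field names a regular expression that recognizes its value.
DOMAIN_KNOWLEDGE = r"""
<domain name="real-estate">
  <field name="price" pattern="\$[\d,]+"/>
  <field name="bedrooms" pattern="(\d+)\s*(?:bed|br)\b"/>
</domain>
"""


def generate_wrapper(knowledge_xml, sample_record, delimiter="|"):
    """Learn from one sample record which delimited position holds each field."""
    domain = ET.fromstring(knowledge_xml)
    wrapper = ET.Element("wrapper", delimiter=delimiter)
    cells = [cell.strip() for cell in sample_record.split(delimiter)]
    for field in domain.findall("field"):
        pattern = re.compile(field.get("pattern"), re.IGNORECASE)
        for index, cell in enumerate(cells):
            if pattern.search(cell):
                # Record the first position whose text matches the field pattern.
                ET.SubElement(wrapper, "rule", name=field.get("name"),
                              position=str(index), pattern=field.get("pattern"))
                break
    return wrapper


def extract(wrapper, record):
    """Apply the wrapper's positional rules to another record from the same source."""
    cells = [cell.strip() for cell in record.split(wrapper.get("delimiter"))]
    result = {}
    for rule in wrapper.findall("rule"):
        index = int(rule.get("position"))
        if index >= len(cells):
            continue
        match = re.search(rule.get("pattern"), cells[index], re.IGNORECASE)
        if match:
            # Prefer the captured group when the pattern defines one.
            result[rule.get("name")] = match.group(match.lastindex or 0)
    return result


wrapper = generate_wrapper(DOMAIN_KNOWLEDGE, "12 Elm St | 3 bed | $250,000")
print(ET.tostring(wrapper, encoding="unicode"))  # the wrapper is itself XML
print(extract(wrapper, "9 Oak Ave | 2 BR | $180,500"))
```

Keeping both the domain knowledge and the learned wrapper as XML, as the abstract describes, means a wrapper generated once per source can be stored and reapplied without rerunning the learning step. A wrapper for real pages would need structural rules over the page markup rather than a fixed delimiter.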

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the Web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...

Automated Information Extraction using Amorphic

The Amorphic system is an adaptive web information extraction scheme for building intelligent systems for mining information from web pages. It can locate data of interest based on domain-knowledge or page structure, can automatically generate a wrapper for an information source, and can detect when the structure of a web-based resource has changed and act on this knowledge to search the update...

Journal of International Scientific Publications

In recent years, several approaches have been proposed to extract information from web pages on the Internet. This research focuses on a key technique that uses crawling and ontology to discover knowledge from the Web. In this paper, we present an intelligent crawling system that uses patterns and ontology to extract particular information from Web sites. The system was developed as an efficient tool to con...

Automatic Rule Retrieval from Websites Using Ontology and Text Mining

A rule-based system, such as an intelligent service-comparison portal, may compare product prices, shipping options, refund options, etc. Such a rule-based system requires an automatic procedure for acquiring knowledge from the Web, which consists of unstructured texts. Knowledge acquisition can be carried out by ontology acquisition and rule acquisition. Obtaining information such as product prices from w...

Publication date: 2001